Goto

Collaborating Authors

 thermal camera


UrbanIng-V2X: ALarge-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception

Neural Information Processing Systems

Recent cooperative perception datasets have played a crucial role in advancing smart mobility applications by enabling information exchange between intelligent agents, helping to overcome challenges such as occlusions and improving overall scene understanding. While some existing real-world datasets incorporate both vehicle-to-vehicle and vehicle-to-infrastructure interactions, they are typically limited to a single intersection or a single vehicle. A comprehensive perception dataset featuring multiple connected vehicles and infrastructure sensors across several intersections remains unavailable, limiting the benchmarking of algorithms in diverse traffic environments. Consequently, overfitting can occur, and models may demonstrate misleadingly high performance due to similar intersection layouts and traffic participant behavior. To address this gap, we introduce UrbanIng-V2X, the first large-scale, multi-modal dataset supporting cooperative perception involving vehicles and infrastructure sensors deployed across three urban intersections in Ingolstadt, Germany. UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences, each lasting 20 seconds. All sequences contain recordings from one of three intersections, involving two vehicles and up to three infrastructure-mounted sensor poles operating in coordinated scenarios. In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs. All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes, resulting in approximately 712k annotated instances across the dataset.


Towards Autonomous Robotic Electrosurgery via Thermal Imaging

arXiv.org Artificial Intelligence

Electrosurgery is a surgical technique that can improve tissue cutting by reducing cutting force and bleeding. However, electrosurgery adds a risk of thermal injury to surrounding tissue. Expert surgeons estimate desirable cutting velocities based on experience but have no quantifiable reference to indicate if a particular velocity is optimal. Furthermore, prior demonstrations of autonomous electrosurgery have primarily used constant tool velocity, which is not robust to changes in electrosurgical tissue characteristics, power settings, or tool type. Thermal imaging feedback provides information that can be used to reduce thermal injury while balancing cutting force by controlling tool velocity. We introduce Thermography for Electrosurgical Rate Modulation via Optimization (ThERMO) to autonomously reduce thermal injury while balancing cutting force by intelligently controlling tool velocity. We demonstrate ThERMO in tissue phantoms and compare its performance to the constant velocity approach. Overall, ThERMO improves cut success rate by a factor of three and can reduce peak cutting force by a factor of two. ThERMO responds to varying environmental disturbances, reduces damage to tissue, and completes cutting tasks that would otherwise result in catastrophic failure for the constant velocity approach.


Real-time Deer Detection and Warning in Connected Vehicles via Thermal Sensing and Deep Learning

arXiv.org Artificial Intelligence

Deer-vehicle collisions represent a critical safety challenge in the United States, causing nearly 2.1 million incidents annually and resulting in approximately 440 fatalities, 59,000 injuries, and 10 billion USD in economic damages. These collisions also contribute significantly to declining deer populations. This paper presents a real-time detection and driver warning system that integrates thermal imaging, deep learning, and vehicle-to-everything communication to help mitigate deer-vehicle collisions. Our system was trained and validated on a custom dataset of over 12,000 thermal deer images collected in Mars Hill, North Carolina. Experimental evaluation demonstrates exceptional performance with 98.84 percent mean average precision, 95.44 percent precision, and 95.96 percent recall. The system was field tested during a follow-up visit to Mars Hill and readily sensed deer providing the driver with advanced warning. Field testing validates robust operation across diverse weather conditions, with thermal imaging maintaining between 88 and 92 percent detection accuracy in challenging scenarios where conventional visible light based cameras achieve less than 60 percent effectiveness. When a high probability threshold is reached sensor data sharing messages are broadcast to surrounding vehicles and roadside units via cellular vehicle to everything (CV2X) communication devices. Overall, our system achieves end to end latency consistently under 100 milliseconds from detection to driver alert. This research establishes a viable technological pathway for reducing deer-vehicle collisions through thermal imaging and connected vehicles.


ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation

arXiv.org Artificial Intelligence

Autonomous systems rely on sensors to estimate the environment around them. However, cameras, LiDARs, and RADARs have their own limitations. In nighttime or degraded environments such as fog, mist, or dust, thermal cameras can provide valuable information regarding the presence of objects of interest due to their heat signature. They make it easy to identify humans and vehicles that are usually at higher temperatures compared to their surroundings. In this paper, we focus on the adaptation of thermal cameras for robotics and automation, where the biggest hurdle is the lack of data. Several multi-modal datasets are available for driving robotics research in tasks such as scene segmentation, object detection, and depth estimation, which are the cornerstone of autonomous systems. However, they are found to be lacking in thermal imagery. Our paper proposes a solution to augment these datasets with synthetic thermal data to enable widespread and rapid adaptation of thermal cameras. We explore the use of conditional diffusion models to convert existing RGB images to thermal images using self-attention to learn the thermal properties of real-world objects.


The MOTIF Hand: A Robotic Hand for Multimodal Observations with Thermal, Inertial, and Force Sensors

arXiv.org Artificial Intelligence

Advancing dexterous manipulation with multi-fingered robotic hands requires rich sensory capabilities, while existing designs lack onboard thermal and torque sensing. In this work, we propose the MOTIF hand, a novel multimodal and versatile robotic hand that extends the LEAP hand by integrating: (i) dense tactile information across the fingers, (ii) a depth sensor, (iii) a thermal camera, (iv), IMU sensors, and (v) a visual sensor. The MOTIF hand is designed to be relatively low-cost (under 4000 USD) and easily reproducible. We validate our hand design through experiments that leverage its multimodal sensing for two representative tasks. First, we integrate thermal sensing into 3D reconstruction to guide temperature-aware, safe grasping. Second, we show how our hand can distinguish objects with identical appearance but different masses - a capability beyond methods that use vision only.


Google Wants to Get Better at Spotting Wildfires From Space

WIRED

Google wants to launch a battalion of satellites into orbit around the Earth to monitor fires on the ground in real time, then collect all that photographic data and use AI to better identify fires in their critical early stages. Fire Sat is a partnership between Google, the nonprofit Earth Fire Alliance, and the satellite builder Muon Space. The collaborative effort was announced in 2024 with the goal of launching satellites specifically designed to spot wildfires. The first satellite of the proposed 50-plus strong constellation launched in March 2025. The group hopes to get the full constellation up there by 2029.


Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions

arXiv.org Artificial Intelligence

Despite significant progress in autonomous navigation, a critical gap remains in ensuring reliable localization in hazardous environments such as tunnels, urban disaster zones, and underground structures. Tunnels present a uniquely difficult scenario: they are not only prone to GNSS signal loss, but also provide little features for visual localization due to their repetitive walls and poor lighting. These conditions degrade conventional vision-based and LiDAR-based systems, which rely on distinguishable environmental features. To address this, we propose a novel sensor fusion framework that integrates a thermal camera with a LiDAR to enable robust localization in tunnels and other perceptually degraded environments. The thermal camera provides resilience in low-light or smoke conditions, while the LiDAR delivers precise depth perception and structural awareness. By combining these sensors, our framework ensures continuous and accurate localization across diverse and dynamic environments. We use an Extended Kalman Filter (EKF) to fuse multi-sensor inputs, and leverages visual odometry and SLAM (Simultaneous Localization and Mapping) techniques to process the sensor data, enabling robust motion estimation and mapping even in GNSS-denied environments. This fusion of sensor modalities not only enhances system resilience but also provides a scalable solution for cyber-physical systems in connected and autonomous vehicles (CAVs). To validate the framework, we conduct tests in a tunnel environment, simulating sensor degradation and visibility challenges. The results demonstrate that our method sustains accurate localization where standard approaches deteriorate due to the tunnels featureless geometry. The frameworks versatility makes it a promising solution for autonomous vehicles, inspection robots, and other cyber-physical systems operating in constrained, perceptually poor environments.


Thermal Vision: Pioneering Non-Invasive Temperature Tracking in Congested Spaces

arXiv.org Artificial Intelligence

Non-invasive temperature monitoring of individuals plays a crucial role in identifying and isolating symptomatic individuals. Temperature monitoring becomes particularly vital in settings characterized by close human proximity, often referred to as dense settings. However, existing research on non-invasive temperature estimation using thermal cameras has predominantly focused on sparse settings. Unfortunately, the risk of disease transmission is significantly higher in dense settings like movie theaters or classrooms. Consequently, there is an urgent need to develop robust temperature estimation methods tailored explicitly for dense settings. Our study proposes a non-invasive temperature estimation system that combines a thermal camera with an edge device. Our system employs YOLO models for face detection and utilizes a regression framework for temperature estimation. We evaluated the system on a diverse dataset collected in dense and sparse settings. Our proposed face detection model achieves an impressive mAP score of over 84 in both in-dataset and cross-dataset evaluations. Furthermore, the regression framework demonstrates remarkable performance with a mean square error of 0.18$^{\circ}$C and an impressive $R^2$ score of 0.96. Our experiments' results highlight the developed system's effectiveness, positioning it as a promising solution for continuous temperature monitoring in real-world applications. With this paper, we release our dataset and programming code publicly.


Rotational Odometry using Ultra Low Resolution Thermal Cameras

arXiv.org Artificial Intelligence

This letter provides what is, to the best of our knowledge, a first study on the applicability of ultra-low-resolution thermal cameras for providing rotational odometry measurements to navigational devices such as rovers and drones. Our use of an ultra-low-resolution thermal camera instead of other modalities such as an RGB camera is motivated by its robustness to lighting conditions, while being one order of magnitude less cost-expensive compared to higher-resolution thermal cameras. After setting up a custom data acquisition system and acquiring thermal camera data together with its associated rotational speed label, we train a small 4-layer Convolutional Neural Network (CNN) for regressing the rotational speed from the thermal data. Experiments and ablation studies are conducted for determining the impact of thermal camera resolution and the number of successive frames on the CNN estimation precision. Finally, our novel dataset for the study of low-resolution thermal odometry is openly released with the hope of benefiting future research.


Comparing Surface Landmine Object Detection Models on a New Drone Flyby Dataset

arXiv.org Artificial Intelligence

Landmine detection using traditional methods is slow, dangerous and prohibitively expensive. Using deep learning-based object detection algorithms drone videos is promising but has multiple challenges due to the small, soda-can size of recently prevalent surface landmines. The literature currently lacks scientific evaluation of optimal ML models for this problem since most object detection research focuses on analysis of ground video surveillance images. In order to help train comprehensive models and drive research for surface landmine detection, we first create a custom dataset comprising drone images of POM-2 and POM-3 Russian surface landmines. Using this dataset, we train, test and compare 4 different computer vision foundation models YOLOF, DETR, Sparse-RCNN and VFNet. Generally, all 4 detectors do well with YOLOF outperforming other models with a mAP score of 0.89 while DETR, VFNET and Sparse-RCNN mAP scores are all around 0.82 for drone images taken from 10m AGL. YOLOF is also quicker to train consuming 56min of training time on a Nvidia V100 compute cluster. Finally, this research contributes landmine image, video datasets and model Jupyter notebooks at https://github.com/UnVeilX/ to enable future research in surface landmine detection.